Subword Units for a Mandarin Keyword Spotting System
Abstract
This paper addresses phonetic modeling in a Mandarin keyword spotting system. The task is to detect 20 keywords in continuous speech from the Call Home corpus of the Linguistic Data Consortium (LDC). Several speech units are explored: whole word, syllable, and demi-syllable (INITIAL and FINAL). In our speaker-independent HMM-based Mandarin keyword spotting experiments, the spotter using base-syllable keyword models performed best, reaching a spotting accuracy of 83.8% at 9.8 false alarms per keyword per hour (FA/KW/H). In the second part of the study, keyword spotting is performed with different numbers of general filler models (389, 182, 37, and 1) in an effort to reduce computation time and increase flexibility.
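The two figures quoted in the abstract, spotting accuracy and FA/KW/H, can be illustrated with a minimal sketch. It assumes the usual definitions: accuracy is the percentage of true keyword occurrences that are detected, and FA/KW/H is the number of false alarms divided by the number of keywords and by the hours of speech searched. The function names and the toy counts below are hypothetical and are not taken from the paper.

```python
def spotting_accuracy(num_hits: int, num_occurrences: int) -> float:
    """Percentage of true keyword occurrences that were detected."""
    if num_occurrences <= 0:
        raise ValueError("num_occurrences must be positive")
    return 100.0 * num_hits / num_occurrences


def false_alarms_per_kw_per_hour(num_false_alarms: int,
                                 num_keywords: int,
                                 hours: float) -> float:
    """False alarms normalized by vocabulary size and audio duration (FA/KW/H)."""
    if num_keywords <= 0 or hours <= 0:
        raise ValueError("num_keywords and hours must be positive")
    return num_false_alarms / (num_keywords * hours)


# Toy counts, invented purely for illustration (not the paper's data).
accuracy = spotting_accuracy(num_hits=8, num_occurrences=10)
fa_rate = false_alarms_per_kw_per_hour(num_false_alarms=10,
                                       num_keywords=5,
                                       hours=0.5)
print(f"accuracy = {accuracy:.1f}%, FA/KW/H = {fa_rate:.1f}")
```

Normalizing by both keyword count and duration makes false alarm rates comparable across systems that use different vocabulary sizes or test sets of different lengths.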
Similar resources
Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem, both for transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for ea...
Morphological Segmentation for Keyword Spotting
• We explore the impact of morphological segmentation on Keyword Spotting (KWS).
• Handling out-of-vocabulary (OOV) words is a major challenge in KWS; we aim to alleviate this problem by utilizing sub-word units.
• We augment a state-of-the-art KWS system with subword units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentatio...
Cross-word sub-word units for low-resource keyword spotting
We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character n-grams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cr...
An Investigation of Subword Unit Representations for Spoken Document Retrieval
This study investigates the feasibility of using subword unit representations for spoken document retrieval as an alternative to using words generated by either keyword spotting or word recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recogn...
Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting
In agglutinative languages like Korean, words are formed by joining affix morphemes to the stem, which leads to a high OOV rate in dictionary building. Hence, subword units are usually used as the basic language modeling units in Large-Vocabulary Continuous Speech Recognition (LVCSR) or in LVCSR-based applications such as keyword spotting. In this work, firstly a new word property called coalescence t...